107 research outputs found

    A survey on privacy in human mobility

    Get PDF
    In the last years we have witnessed a pervasive use of location-aware technologies such as vehicular GPS-enabled devices, RFID based tools, mobile phones, etc which generate collection and storing of a large amount of human mobility data. The powerful of this data has been recognized by both the scientific community and the industrial worlds. Human mobility data can be used for different scopes such as urban traffic management, urban planning, urban pollution estimation, etc. Unfortunately, data describing human mobility is sensitive, because people's whereabouts may allow re-identification of individuals in a de-identified database and the access to the places visited by indi-viduals may enable the inference of sensitive information such as religious belief, sexual preferences, health conditions, and so on. The literature reports many approaches aimed at overcoming privacy issues in mobility data, thus in this survey we discuss the advancements on privacy-preserving mo-bility data publishing. We first describe the adversarial attack and privacy models typically taken into consideration for mobility data, then we present frameworks for the privacy risk assessment and finally, we discuss three main categories of privacy-preserving strategies: methods based on anonymization of mobility data, methods based on the differential privacy models and methods which protect privacy by exploiting generative models for synthetic trajectory generation

    Privacy by Design in Data Mining

    Get PDF
    Privacy is ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data, unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at organization level. Unfortunately, it is in- creasingly hard to transform the data in a way that it protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technol- ogy by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm that sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones. This thesis investigates on two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing

    Local Rule-Based Explanations of Black Box Decision Systems

    Get PDF
    The recent years have witnessed the rise of accurate but obscure decision systems which hide the logic of their internal decision processes to the users. The lack of explanations for the decisions of black box systems is a key ethical issue, and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. %Therefore, we need explanations that reveals the reasons why a predictor takes a certain decision. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons of the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first leans a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. Then it derives from the logic of the local interpretable predictor a meaningful explanation consisting of: a decision rule, which explains the reasons of the decision; and a set of counterfactual rules, suggesting the changes in the instance's features that lead to a different outcome. Wide experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy in mimicking the black box

    Unveiling mobility complexity through complex network analysis

    Get PDF
    The availability of massive digital traces of individuals is offering a series of novel insights on the understanding of patterns characterizing human mobility. Many studies try to semantically enrich mobility data with annotations about human activities. However, these approaches either focus on places with high frequencies (e.g., home and work), or relay on background knowledge (e.g., public available points of interest). In this paper, we depart from the concept of frequency and we focus on a high level representation of mobility using network analytics. The visits of each driver to each systematic destination are modeled as links in a bipartite network where a set of nodes represents drivers and the other set represents places. We extract such network from two real datasets of human mobility based, respectively, on GPS and GSM data. We introduce the concept of mobility complexity of drivers and places as a ranking analysis over the nodes of these networks. In addition, by means of community discovery analysis, we differentiate subgroups of drivers and places according both to their homogeneity and to their mobility complexity

    Foundations of Multidimensional Network Analysis

    Full text link
    Abstract—Complex networks have been receiving increasing attention by the scientific community, thanks also to the increas-ing availability of real-world network data. In the last years, the multidimensional nature of many real world networks has been pointed out, i.e. many networks containing multiple connections between any pair of nodes have been analyzed. Despite the importance of analyzing this kind of networks was recognized by previous works, a complete framework for multidimensional network analysis is still missing. Such a framework would enable the analysts to study different phenomena, that can be either the generalization to the multidimensional setting of what happens in monodimensional network, or a new class of phenomena induced by the additional degree of complexity that multidimensionality provides in real networks. The aim of this paper is then to give the basis for multidimensional network analysis: we develop a solid repertoire of basic concepts and analytical measures, which takes into account the general structure of multidimensional networks. We tested our framework on a real world multidimensional network, showing the validity and the meaningfulness of the measures introduced, that are able to extract important, non-random, information about complex phenomena. I

    A survey of methods for explaining black box models

    Get PDF
    In recent years, many accurate decision support systems have been constructed as black boxes, that is as systems that hide their internal logic to the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem and, as a consequence, it explicitly or implicitly delineates its own definition of interpretability and explanation. The aim of this article is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help the researcher to find the proposals more useful for his own work. The proposed classification of approaches to open black box models should also be useful for putting the many research open questions in perspective

    Meaningful Explanations of Black Box AI Decision Systems

    Get PDF
    Black box AI systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for lack of transparency, but also for possible biases inherited by the algorithms from human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We focus on the urgent open challenge of how to construct meaningful explanations of opaque AI/ML systems, introducing the local-to-global framework for black box explanation, articulated along three lines: (i) the language for expressing explanations in terms of logic rules, with statistical and causal interpretation; (ii) the inference of local explanations for revealing the decision rationale for a specific case, by auditing the black box in the vicinity of the target instance; (iii), the bottom-up generalization of many local explanations into simple global ones, with algorithms that optimize for quality and comprehensibility. We argue that the local-first approach opens the door to a wide variety of alternative solutions along different dimensions: a variety of data sources (relational, text, images, etc.), a variety of learning problems (multi-label classification, regression, scoring, ranking), a variety of languages for expressing meaningful explanations, a variety of means to audit a black box
    • …
    corecore